A Data Intensive Multi-chunk Ensemble Technique to Classify Stream Data Using Map-Reduce Framework

Authors

Tahseen Al-Khateeb

Mohammad Salim Ahmed

Mohammad Masud

Latifur Khan

Abstract

We propose a data-intensive, distributed, multi-chunk ensemble classifier-based data mining technique for classifying data streams. In our approach, we combine the r most recent consecutive data chunks with the data chunks in the current ensemble and train a new ensemble on the combined data. By implementing this multi-chunk ensemble technique in a Map-Reduce framework and accounting for concept drift in the data, we significantly reduce both running time and classification error compared to other ensemble approaches. We empirically demonstrate its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real-world botnet traffic.
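The chunk-combining idea in the abstract can be illustrated with a minimal single-machine sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the class names `MultiChunkEnsemble` and `CentroidClassifier`, the nearest-centroid base learner, the rule of ranking models by their error on the newest chunk, and majority voting. The sketch also simplifies in two ways: it trains only on the r most recent chunks rather than also reusing the chunks behind the current ensemble, and it omits the Map-Reduce distribution entirely.

```python
from collections import Counter, deque
from statistics import mean


class CentroidClassifier:
    """Toy nearest-centroid learner; a stand-in for any base classifier."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # One centroid per class: the per-feature mean of that class's rows.
        self.centroids = {
            label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


def error_rate(model, X, y):
    wrong = sum(model.predict_one(x) != label for x, label in zip(X, y))
    return wrong / len(y)


class MultiChunkEnsemble:
    """Hypothetical sketch: train each new model on the r latest chunks."""

    def __init__(self, r=2, ensemble_size=3):
        self.recent = deque(maxlen=r)      # holds the r most recent chunks
        self.ensemble_size = ensemble_size
        self.models = []

    def update(self, X, y):
        """Consume one labeled chunk and refresh the ensemble."""
        self.recent.append((X, y))
        train_X = [x for chunk_X, _ in self.recent for x in chunk_X]
        train_y = [label for _, chunk_y in self.recent for label in chunk_y]
        candidate = CentroidClassifier().fit(train_X, train_y)
        # Assumed selection rule: keep the models with the lowest error on
        # the newest chunk, so models fitted to an outdated concept drop out.
        pool = self.models + [candidate]
        pool.sort(key=lambda m: error_rate(m, X, y))
        self.models = pool[: self.ensemble_size]

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        return votes.most_common(1)[0][0]
```

Calling `update(X, y)` once per labeled chunk and `predict_one(x)` on unlabeled points is the intended usage. Note that the newest chunk is also part of the candidate's training data, so ranking on it favors the candidate; a held-out or cross-validated error estimate would be a fairer choice.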

Similar resources

Mining Concept-Drifting Data Stream to Detect Peer to Peer Botnet Traffic

We propose a novel stream data classification technique to detect peer-to-peer botnets. Botnet traffic can be considered stream data with two important properties: infinite length and a drifting concept. Thus, a stream data classification technique is more appealing for botnet detection than a simple classification technique. However, no other botnet detection approaches so far have applied stream...

A Multi-partition Multi-chunk Ensemble Technique to Classify Concept-Drifting Data Streams

We propose a multi-partition, multi-chunk ensemble classifier based data mining technique to classify concept-drifting data streams. Existing ensemble techniques in classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive dat...

Classification of Streaming Fuzzy DEA Using Self-Organizing Map

The classification of fuzzy data is considered one of the most challenging areas of data analysis, and the complexity of the procedures has been an obstacle to the development of new methods for fuzzy data analysis. However, there have been significant advances in modeling systems in which fuzzy data are available in the field of mathematical programming. In order to exploit the results of the research on ...

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated by various information sources at high speed and high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, the stream is commonly divided into fixed-size windows, or gradual forgetting is used. Concept drift refer...

Combining Classifier Guided by Semi-Supervision

The article suggests an algorithm for regular classifier ensemble methodology. The proposed methodology is based on possibilistic aggregation to classify samples. The argued method optimizes an objective function that combines environment recognition, multi-criteria aggregation term and a learning term. The optimization aims at learning backgrounds as solid clusters in subspaces of the high...

Publication date: 2010
